In the global world we live in, free and open-source software (FOSS) is increasingly becoming a
good option, not only for public administrations and companies, but also for general users.


This article is a personal testimony of my experience with free and open-source software over the last 
5 years, both as a user and as a tester of Language Applications. 


Can FOSS be a translator's good friend? 


Some years ago, open-source software was difficult to install and use and required substantial IT
knowledge. Nowadays it is easy to use and generally less resource-hungry. Besides, most of
the time there is plenty of information, including videos, that guides new users step by step.


In the field of Language Applications, translators now have open-source Computer-Assisted
Translation (CAT) tools and can also explore the potential of “tailor-made” Machine Translation
(MT) with open-source software.


I will concentrate on one CAT tool, OmegaT (a multiplatform application), and one MT toolkit,
Moses (a sophisticated, state-of-the-art Statistical Machine Translation system which powers
many MT services all over the world), the latter via a small and pioneering project: Moses for
Mere Mortals.


OmegaT is an open-source CAT tool originally developed by Keith Godfrey in 2000. It has since
been vastly improved and is still being developed by a team led by Didier Briel.


OmegaT is also the basis of the open-source Autshumato Translation Suite, a project funded by the
Department of Arts and Culture of South Africa and developed and managed by the Centre for Text
Technology at the North-West University of South Africa.


Moses for Mere Mortals (MMM) was born in 2009 out of a personal initiative of 3 DGT translators:
João Rosas (developer), and Hilário Leal Fontes and myself (as testers). We wanted to experiment
with Moses, which was then an MT prototype known almost only to experts.


The MMM scripts were first made available as open-source software in November 2009 because we
believe in sharing and because we think that translators should not feel threatened by Machine
Translation (MT), but rather understand its advantages and limitations and integrate it in their
workflow as another CAT tool to help them in their work.


Furthermore, translators should have the possibility to leverage their translation memories and
“master” the use of MT with computational means as affordable as a computer costing 1000 euros
or less, and with an acceptable amount of time and effort. After all, it is translators who
provide the raw material, high-quality human translations, that makes Statistical Machine
Translation possible.


The fact is that, with these open-source applications, translators can choose to be independent and 
have full control over the tools of their trade! 


But before going into practical details, let’s have a bit of history about the open source movement — 
and its philosophy — and about free and open-source software and the present situation concerning 
Language Applications. 


The open source movement and the Linux operating system 


The open-source movement started in the late 1970s in the USA with Richard Matthew Stallman,
a pioneer and activist who in 1983 launched the GNU Project with the objective of creating a
Unix-like computer operating system composed entirely of free software.


The open-source movement is a broad-reaching movement of individuals who support the use of
open-source licences. Its philosophy is based on: “1. The freedom to use the software for any
purpose; 2. The freedom to change the software to suit your needs; 3. The freedom to share the
software with friends and neighbours; 4. The freedom to share the changes you make”.


Almost two decades later, in the early 1990s, Linus Torvalds, a Finnish computer scientist (then a
student), developed a budding new operating system, Linux, which was first released in 1991.
Linux, with its various distributions, is what can be called an amazing success story and has
evolved and grown over the last two decades with the collaboration of many developers around the
world. It is now managed by the Linux Foundation and Linus Torvalds continues to be deeply
involved in the Linux kernel as Project Coordinator.


What is not widely understood is that open-source software is, in fact, much more a part of our life
than we think. We use it every day, without being aware of it, when we access the Internet or
communicate via our cell phones. Linux is also increasingly finding its way into modern
appliances, from high-tech fridges and smart TVs to luxury cars.


Another amazing fact is that 485 of the world’s top-500 supercomputers run Linux, which represents a
massive 97 %.


It is also interesting to note that in recent years national administrations all over the world,
driven in part by budgetary constraints and confidentiality requirements, have been migrating to
Linux and open-source software in general.


In the words of: 


NEELIE KROES, European Commission Vice-President and Commissioner for the Digital
Agenda (2010-2014): Video (2010): “Open-source is not a dirty word anymore. It used to be a dirty
word, a scary one. For example, in e-government applications and public administrations.
Open-source was something to be used at home, as a hobby. Nobody would dare to push for an
open-source solution in a public organisation. Who will offer us technical support? And what if we are
sued for infringing somebody's IP if we use this?... Thankfully attitudes are changing! ... The reason is
not only good value for money, which is critical in today's financial situation, but also more choice.
There is lower dependence on certain vendors and lower switching costs. ...Things are changing also
in the private sector. Now large companies declare proudly that they are not only using open-source
solutions but also contributing to it.”


OBAMA ADMINISTRATION website (2012): “Open-Source and the Power of Community: On 
open source projects, programmers build tools to solve specific problems, then make those tools freely 
available so others can use them and contribute their own improvements. The communities of 
programmers that grow up around successful open source projects often produce tools that are more 
secure, flexible, and cost-effective than those produced by a team working in isolation.”


“We believe in using and contributing back to open source software as a way of making it easier for 
the government to share data, improve tools and services, and return value to taxpayers” (2014).


PHILIPP KOEHN, Associate Professor at the University of Edinburgh and Coordinator of the EU
co-funded MOSESCORE research project (2006): “There are several reasons to create an open 
research environment by opening up resources (tools and corpora) freely to the wider community. 
Since our research is largely publicly funded, it seems appropriate to return the products of this work 
to the public. Access to free resources enables other research group to advance work that was started 
here, and provides them with baseline performance results for their own novel efforts. 


While these are honourable goals, our motivation for creating this toolkit is also somewhat 
self-interested: Building statistical machine translation systems has become a very complex task, and 
rapid progress in the field forces us to spend much time reimplementing other researchers’ advances in 
our system. By bringing several research groups together to work on the same system, this duplication 
of effort is reduced and we can spend more time on what we would really like to do: Come up with new 
ideas and test them.”


The European Commission and open-source software 


The European Commission has been very active in this domain, promoting the development and use of
open-source software, notably by creating the European Union Public Licence (EUPL) and the JoinUp
Initiative (Open Source Observatory).


JoinUp “is a platform for exchanging information, experiences and Free Libre Open Source
Software-based code for use in public administrations. This community aims to help public
administrations share such software solutions, discuss good practice and exchange their
experiences”.


The European Commission also promotes the development of open-source software within its
Framework Programmes on Research and Technological Development, notably in the field of
Language Technologies.


It was with co-funding from the 7th RTD Framework Programme that the Moses Machine Translation
system was developed and made available as open-source software. The same Framework Programme
also funds projects like MATECAT (Project Coordinator: Marcello Federico) and CASMACAT (Project
Coordinator: Philipp Koehn), which aim to develop a new generation of CAT tools with a high level
of interactivity (online learning) and integration between translation memories, machine
translation and terminology databases.


Furthermore, because good-quality data is important for Language Applications, the EU institutions
have made corpora in all the EU official languages available on the europa.eu portal. From 2007
onwards, the Commission Joint Research Centre has published DGT aligned corpora of EU legislation
with more than 10 million segments, thereby completing the circle needed to train MT engines.


The European Commission is also supporting META-NET, the Multilingual Europe Technology
Alliance Network, as the availability and integration of digital language resources of all kinds
(namely translation memories and terminology databases) will be a key factor in the survival of
languages in the Digital Age ... and therefore in the survival of translators!


The META-NET White Papers give an overview of how the EU languages fare in the Digital Age.
Of particular interest for Portugal, the Portuguese Language in the Digital Age White Paper, by
António Branco et al., should make us aware that there is (lots of) work to be done!


DGT and open-source Language Applications


Concerning Machine Translation, DGT has been a user since the mid-1970s, starting with a
rule-based commercial system. An English-Portuguese language pair had existed since the mid-1980s
and a French-Portuguese language pair was created in the late 1990s. The Portuguese Language
Department was one of the Departments that made significant use of MT from 2000 onwards and
invested a lot in its improvement for about a decade.


However, rule-based MT is expensive to develop and improve, and each language pair requires a
significant effort in terms of human resources and time. In 2010, DGT only offered 18 language
combinations (+10 as prototypes), none of which covered the new languages of the 2004 EU
Enlargement.


Therefore, in 2010 a new MT Action Plan was approved to create a new MT service, MT@EC,
based on the open-source Moses system. With a team led by Andreas Eisele, MT Project Manager
and former researcher in the EuroMatrixPlus Project, the service became available in 2011.


MT@EC is meant to serve the needs of the European Commission and the EU institutions in
general and can also be used by the EU national administrations (including universities).


MT@EC offers 552 language combinations between all the EU official languages (62 in direct mode
and the others via a pivot language).
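
The figure of 552 follows from simple counting: with 24 official EU languages, every language can be a source for each of the 23 others, which gives 24 x 23 = 552 directed combinations. The short Python sketch below reproduces that count and illustrates what translating "via a pivot language" means. It is only an illustration under stated assumptions: the choice of English as the pivot, the `route` helper and the example set of direct pairs are hypothetical and do not describe how MT@EC is actually configured.

```python
from itertools import permutations

# The 24 official EU languages (ISO 639-1 codes).
EU_LANGUAGES = [
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]


def directed_pairs(languages):
    """Every ordered (source, target) pair of two distinct languages."""
    return list(permutations(languages, 2))


def route(source, target, direct, pivot="en"):
    """Return the translation hops needed to go from source to target.

    `direct` is the set of pairs that have their own engine; any other
    pair is translated in two hops through the pivot language.
    The pivot defaults to English purely as an assumption.
    """
    if (source, target) in direct:
        return [(source, target)]
    return [(source, pivot), (pivot, target)]


all_pairs = directed_pairs(EU_LANGUAGES)

# Hypothetical set of direct engines: every pair with the pivot on one side.
# (The article states that 62 pairs are direct but does not list them.)
direct = {pair for pair in all_pairs if "en" in pair}

print(len(all_pairs))             # 24 * 23 directed combinations
print(len(all_pairs) - 62)        # combinations left to the pivot route
print(route("en", "pt", direct))  # a single direct hop
print(route("pt", "fi", direct))  # two hops through the pivot
```

The pivot approach trades some quality for coverage: only the pairs involving the pivot need their own engine, and every other combination is obtained by chaining two of them.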


Concerning CAT tools, DGT had been using a commercial product since the late 1990s and in 2012 
acquired a new commercial application. However, DGT has also been using an open-source CAT tool 
— OmegaT — for prototyping purposes, which is also available to DGT translators who want to use 
it. 


How can FOSS be a translator's good friend? 


In the context briefly described above, where does an individual translator — or a small group of 
translators — stand in terms of CAT tools and Machine Translation? 


In terms of CAT tools, there is a large variety of commercial applications and some open-source
applications and it is interesting to see what translators think of the options available.


In terms of Machine Translation, there is quite a choice between commercial systems and also services 
and platforms that can be used — either freely or at a (low) cost. 


Specifically in terms of open-source MT software, there are 2 main systems for those who want to
develop their own MT engines and have control of the whole process: Apertium (rule-based) and
Moses (data-driven).


CAT tool — OmegaT


OmegaT is easy to install and its Guide (Help) gives very detailed explanations. There are also
videos on the Internet that are very useful, both to start working with OmegaT and to explore its more
advanced features, and its User Group is very active and helpful.


In DGT, OmegaT has been used since 2012 for prototyping. For this purpose, version 2.6.0 of
OmegaT was customised and extended to integrate other DGT tools. Furthermore, DGT developed
an in-house OmegaT Project Wizard to integrate OmegaT in its workflow. In 2012, OmegaT and its
Project Wizard were made available to all translators interested in using them.


I was a tester of both OmegaT and the commercial CAT tool acquired by DGT and I liked OmegaT so
much that I have been using it for more than 2 years. In the Portuguese Language Department, about
25% of the translators are now using OmegaT either as their main CAT tool or for some translation
projects.


Because DGT-OmegaT differs in some respects from the public version, I wrote a Quick Guide
and a Guide that take into account DGT-OmegaT’s internal improvements, its adaptations to our
workflow and the OmegaT Project Wizard.


In DGT we are privileged to have an IT Unit that takes care of the technical details (installation,
plugins, scripts, defaults, compatibility, etc.), so these Guides focus purely on the translation
process. They reflect the way I use OmegaT to translate documents in Office format, which make up
the vast majority of documents translated in DGT, and are not meant to cover every feature and
possible use of OmegaT.


As I believe in sharing, and even though some features are specific to DGT, I make them
available as a complement to this article: DGT-OmegaT and its Wizard — Quick Guide and
DGT-OmegaT and its Wizard — A Translator’s Guide. I hope some translators will find them
helpful.


Machine Translation — Moses and Moses for Mere Mortals 


Moses is a Statistical Machine Translation system developed within the EU-funded EuroMatrix(Plus)
Project and is now coordinated within the MOSESCORE Project under the guidance of Philipp
Koehn.

In the 2011 article “Will there be a thousand Moses MT systems?”, Achim Ruopp expressed a wish:
“I'd love for a thousand Moses systems to bloom, and am quite confident that with the various use
case scenarios becoming easier overtime, many more than a thousand will blossom.”


In 2014, I think it is safe to say that his wish has come true and that Moses is the basis of many MT 
systems used in all kinds of contexts. 


What is more, Machine Translation is not only freely available on the Internet (although
with reservations for many translation projects) but is also within reach of absolute
beginners with unsophisticated home computers and basic IT skills.


With applications like Moses for Mere Mortals, which interface with the highly sophisticated and
complex Moses SMT system, MT can be made “understandable” to users without a degree in
computational linguistics or computer science.


Using their own translation memories, the corpora made available by the EU institutions, or a
combination of both, translators can now use this real-world translation chain, at least for any
EU official language pair. Language pairs can be trained on a home computer in a few hours or
days, depending on the size of the corpus and the capacity of the computer.


Output from MMM, trained with DGT corpora on a 1000-euro computer, was used by the
Portuguese Language Department between early 2010 and mid-2011, before the new MT@EC became
available.


MMM has been the subject of an article in «a folha» with a Case Study with EN-PT testing corpora,
which presents the kind of results that can be obtained even with unsophisticated resources.


MMM is also used in the Swiss-based Olanto Foundation’s MyMT as a Translation (Back-end) Server
in its CAT Suite.

In 2013, within the DGT’s Visitor Translators’ Scheme, my colleague Hilário Leal Fontes gave
training on Moses for Mere Mortals to students and teachers of the Portuguese universities of Porto
and Minho. To my knowledge, Moses was directly trained and used with specific corpora, via the MMM
scripts, in 2 Master's theses, along with other publicly available Machine Translation systems.


This just shows that it is possible to build a “home-grown” variety of Machine Translation ... and that 
where there is a will there is a way! 


How to contribute to the FOSS movement 


It is interesting to note that — in the field of FOSS Language Applications — Europeans have been 
leaders and key contributors, supporting and funding research projects or developing, by private 
initiative, open-source applications. But of course FOSS knows no frontiers and there are 
contributions from all over the world. 


The fact is that using FOSS is not merely a matter of saving money by using “cheap” software; it is
a matter of sharing the best possible software developed in open cooperation.


You can contribute in a number of ways: by spreading the word, by using it, by contributing your
experience (be it in a video, a guide, helping others or giving suggestions for improvements),
by sharing code or by making a donation.


Nothing in this life can be taken for granted and FOSS is no exception. The FOSS movement needs to 
be nurtured and supported to survive and the Internet — and all of us — make it all possible! 


maria.machado@ec.europa.eu